xqvvu (Contributor) commented Nov 17, 2025

Note

Adds S3-backed storage for dataset files/images with presigned upload/download, JWT proxy, and end-to-end UI/backend changes while keeping GridFS compatibility.

  • Storage/S3 Integration:
    • Introduces S3DatasetSource (upload by buffer, get/put/stat, delete by key/prefix, list) and TTL utils; supports dataset/chat/avatar buckets.
    • Adds JWT-signed proxy endpoint GET /api/system/file/[jwt] to stream S3 objects.
    • Handles mixed keys: recognizes S3 dataset/... vs GridFS IDs across delete/read paths.
  • APIs:
    • New endpoints: POST /core/dataset/presignDatasetFilePostUrl, POST /core/dataset/presignDatasetFileGetUrl (zod schemas in global/core/dataset/v2/api.ts).
    • Updates dataset create/import/image upload flows to use S3 (presigned POST), including images extracted during parsing, and removes TTLs once files are in use.
    • Read/parse pipeline now uploads parsed images to S3 and replaces refs; link/file readers adapted to S3 buffers.
  • Dataset Processing:
    • Collection parsing and vector queues adapted for S3 files; manages TTLs and cleans up parsed images by prefix; supports image collections via S3.
    • Read/source sync and preview chunk generation accept datasetId and S3 keys.
  • Permissions/Auth:
    • File auth supports S3 dataset keys; dataset citations/images use signed URLs; adds JWT sign/verify helpers.
  • Schema/Types:
    • fileId in collections is now string (S3 key or GridFS id); collection file metadata simplified to { filename?, contentLength? }.
    • Adds zod schemas/types for presign params and S3 helpers.
  • Frontend:
    • Markdown image component resolves dataset/... and chat/... via presigned URLs.
    • Dataset import (local) uploads via presigned POST; the metadata card shows the decoded filename and S3 content length; opening the source file now uses a presigned GET URL.
  • Misc:
    • Removes Mongo-only read path; refactors S3 base bucket (copy/delete, put/get/stat, presign policy metadata); i18n strings updated.
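
The mixed-key handling above (telling S3 `dataset/...` keys apart from GridFS IDs on delete and read paths) can be sketched as a small classifier. This is a minimal sketch, not the PR's code: it assumes S3 dataset keys always begin with `dataset/` and that GridFS IDs are 24-character hex Mongo ObjectIds; `classifyFileId` and `FileRef` are hypothetical names.

```typescript
// Hypothetical sketch: function and type names are illustrative, not from the PR.
// Assumes S3 dataset keys start with "dataset/" and GridFS IDs are 24-hex ObjectIds.

type FileRef =
  | { kind: "s3"; key: string }
  | { kind: "gridfs"; id: string }
  | { kind: "unknown"; raw: string };

const S3_DATASET_PREFIX = "dataset/";
const OBJECT_ID_PATTERN = /^[0-9a-fA-F]{24}$/;

function classifyFileId(fileId: string): FileRef {
  if (fileId.startsWith(S3_DATASET_PREFIX)) {
    return { kind: "s3", key: fileId };
  }
  if (OBJECT_ID_PATTERN.test(fileId)) {
    return { kind: "gridfs", id: fileId };
  }
  // Neither shape matched: let the caller reject it instead of guessing a backend.
  return { kind: "unknown", raw: fileId };
}

// Usage: a delete or read path can branch on `kind`.
const ref = classifyFileId("dataset/abc123/report.pdf");
if (ref.kind === "s3") {
  console.log(`delete S3 object ${ref.key}`);
} else if (ref.kind === "gridfs") {
  console.log(`delete GridFS file ${ref.id}`);
}
```

Returning an explicit `unknown` variant, rather than defaulting to one backend, keeps a malformed `fileId` from being silently sent to the wrong store.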

Written by Cursor Bugbot for commit 31bfab7.

gru-agent bot (Contributor) commented Nov 17, 2025

There is too much information in the pull request to test.

github-actions bot commented Nov 17, 2025

Preview mcp_server Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_mcp_server_65d4b2ded339aed02c0c8fc98da23635c8b7edfb

github-actions bot commented Nov 17, 2025

Preview sandbox Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_sandbox_65d4b2ded339aed02c0c8fc98da23635c8b7edfb

xqvvu force-pushed the v4.14.2-dev branch 2 times, most recently from 7b834ca to bb0d038, on November 17, 2025 09:57
github-actions bot commented Nov 17, 2025

Preview fastgpt Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_65d4b2ded339aed02c0c8fc98da23635c8b7edfb

c121914yu force-pushed the v4.14.2-dev branch 3 times, most recently from 4d4d1b4 to 00b5f2e, on November 18, 2025 10:03
github-actions bot commented Nov 18, 2025

Docs Preview:

🚀 FastGPT Document Preview Ready!

c121914yu force-pushed the v4.14.2-dev branch 2 times, most recently from 22b70f6 to 814075d, on November 18, 2025 11:26
c121914yu deleted the branch labring:v4.14.3-dev November 18, 2025 11:27
c121914yu closed this Nov 18, 2025
c121914yu reopened this Nov 19, 2025
c121914yu changed the base branch from v4.14.2-dev to v4.14.3-dev November 19, 2025 01:42

cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

c121914yu requested a review from Copilot November 19, 2025 14:54
Copilot finished reviewing on behalf of c121914yu November 19, 2025 14:58

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements S3-backed storage for dataset files and images, with presigned upload/download URLs and JWT-based proxy access, while maintaining backward compatibility with GridFS. It adds an S3 integration layer without breaking existing functionality.
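
JWT-based proxy access means the object key and an expiry are carried in a signed token that forms the `[jwt]` segment of `GET /api/system/file/[jwt]`; the server verifies the token before streaming the object. A minimal sketch, assuming an HS256 token with hypothetical `objectKey` and `exp` payload fields; it hand-rolls signing with Node's built-in `crypto` to stay dependency-free, and the PR's own sign/verify helpers may differ.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape for a file-access token.
interface FileTokenPayload {
  objectKey: string; // S3 object key the token grants access to
  exp: number; // expiry time, seconds since the Unix epoch
}

const b64url = (buf: Buffer): string => buf.toString("base64url");

const hs256 = (data: string, secret: string): Buffer =>
  createHmac("sha256", secret).update(data).digest();

function signFileToken(payload: FileTokenPayload, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const body = b64url(Buffer.from(JSON.stringify(payload)));
  const sig = b64url(hs256(`${header}.${body}`, secret));
  return `${header}.${body}.${sig}`;
}

// Returns the payload if the signature is valid and the token has not expired, else null.
function verifyFileToken(
  token: string,
  secret: string,
  nowSec: number = Math.floor(Date.now() / 1000),
): FileTokenPayload | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, sig] = parts;
  const expected = hs256(`${header}.${body}`, secret);
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as FileTokenPayload;
  return payload.exp > nowSec ? payload : null;
}

// Usage: the token becomes the [jwt] path segment of the proxy URL.
const secret = "dev-only-secret"; // in practice, load from server configuration
const token = signFileToken(
  { objectKey: "dataset/abc123/report.pdf", exp: Math.floor(Date.now() / 1000) + 600 },
  secret,
);
console.log(`/api/system/file/${token}`);
```

Base64url encoding keeps the token free of `/` and `+`, so it can sit directly in a URL path segment.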

Key Changes

  • Adds S3 storage integration with S3DatasetSource class providing upload, download, delete, and metadata operations for dataset files
  • Implements JWT-signed proxy endpoint for secure S3 object streaming and presigned URL generation APIs
  • Updates dataset processing pipeline to handle S3 keys alongside GridFS IDs, including TTL management and parsed image cleanup
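
Parsed-image cleanup by prefix relies on all images extracted from one source file sharing a key prefix, so deleting a collection can list everything under that prefix and delete it in batches. A minimal sketch: the `ObjectStore` interface, `deleteByPrefix`, and the key layout in the usage are hypothetical; the default batch size of 1000 mirrors the per-request key limit of the S3 `DeleteObjects` API.

```typescript
// Hypothetical minimal interface over an object store; names are illustrative.
interface ObjectStore {
  listKeys(prefix: string): Promise<string[]>;
  deleteKeys(keys: string[]): Promise<void>;
}

// Deletes every object under `prefix` in batches and returns how many were removed.
async function deleteByPrefix(store: ObjectStore, prefix: string, batchSize = 1000): Promise<number> {
  // Without a trailing slash, "parsed/f1" would also match "parsed/f10/...".
  if (!prefix.endsWith("/")) {
    throw new Error(`prefix must end with "/": ${prefix}`);
  }
  const keys = await store.listKeys(prefix);
  for (let i = 0; i < keys.length; i += batchSize) {
    await store.deleteKeys(keys.slice(i, i + batchSize));
  }
  return keys.length;
}

// In-memory store, for demonstration only.
class MemoryStore implements ObjectStore {
  readonly keys: Set<string>;
  constructor(keys: Iterable<string> = []) {
    this.keys = new Set(keys);
  }
  async listKeys(prefix: string): Promise<string[]> {
    return Array.from(this.keys).filter((k) => k.startsWith(prefix));
  }
  async deleteKeys(keys: string[]): Promise<void> {
    for (const k of keys) this.keys.delete(k);
  }
}

// Usage: remove the images parsed out of one source file (hypothetical key layout).
const store = new MemoryStore([
  "dataset/d1/parsed/f1/img-1.png",
  "dataset/d1/parsed/f1/img-2.png",
  "dataset/d1/parsed/f10/img-1.png",
  "dataset/d1/f1.pdf",
]);
deleteByPrefix(store, "dataset/d1/parsed/f1/").then((n) => console.log(`deleted ${n} objects`));
```

A real S3 client also pages its list results, so a production version would list and delete page by page instead of collecting every key first.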

Reviewed Changes

Copilot reviewed 64 out of 66 changed files in this pull request and generated 40 comments.

| File | Description |
| --- | --- |
| `projects/app/src/pages/api/system/file/[jwt].ts` | New JWT-authenticated proxy endpoint for streaming S3 objects |
| `projects/app/src/pages/api/core/dataset/presignDatasetFilePostUrl.ts` | API for generating presigned upload URLs with authentication |
| `projects/app/src/pages/api/core/dataset/presignDatasetFileGetUrl.ts` | API for generating presigned download URLs, supporting both collections and direct keys |
| `packages/service/common/s3/sources/dataset/index.ts` | Core S3DatasetSource class implementing dataset file operations |
| `packages/service/common/s3/utils.ts` | JWT signing/verification utilities and S3 TTL management helpers |
| `packages/service/core/dataset/read.ts` | Updated file-reading logic to support S3 keys and parsed-image uploads |
| `projects/app/src/pageComponents/dataset/detail/Import/diffSource/FileLocal.tsx` | Frontend upload flow migrated to presigned POST |
| `projects/app/src/components/Markdown/img/Image.tsx` | Markdown image component with S3 presigned URL resolution |
| `packages/service/core/dataset/collection/schema.ts` | Schema change: fileId is now a String type supporting both GridFS IDs and S3 keys |
| `packages/service/core/dataset/collection/controller.ts` | Collection deletion updated to clean up S3 files and parsed images |
| `packages/web/i18n/*/chat.json` | Added translation for the image-collection-unsupported error |
| `packages/web/i18n/*/app.json` | Updated file upload tip to reflect S3 storage behavior |
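
The Markdown image component's behavior (resolving `dataset/...` and `chat/...` sources through presigned URLs) reduces to a small async resolver that leaves ordinary URLs untouched. A framework-free sketch under stated assumptions: `PresignFn` is a hypothetical callback standing in for the app's call to a presign API, and the two prefixes come from the PR summary.

```typescript
// Key prefixes the summary says are resolved through presigned URLs.
const PRESIGNED_PREFIXES = ["dataset/", "chat/"];

// Hypothetical callback: in the app this would call the presign API for the key.
type PresignFn = (objectKey: string) => Promise<string>;

function isStorageKey(src: string): boolean {
  return PRESIGNED_PREFIXES.some((prefix) => src.startsWith(prefix));
}

// Returns a URL an <img> tag can load: storage keys are exchanged for a
// presigned URL; anything else (http(s) URLs, data: URIs, relative paths) passes through.
async function resolveImageSrc(src: string, presign: PresignFn): Promise<string> {
  return isStorageKey(src) ? presign(src) : src;
}

// Usage with a stub presigner standing in for the real API call.
const stubPresign: PresignFn = async (key) => `https://s3.example.com/${key}?X-Amz-Signature=stub`;

resolveImageSrc("dataset/d1/parsed/f1/img-1.png", stubPresign).then(console.log);
resolveImageSrc("https://example.com/logo.png", stubPresign).then(console.log);
```

Only keys that match a known prefix trigger a network call, so images with ordinary URLs render without waiting on the presign endpoint.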

c121914yu merged commit 68dba75 into labring:v4.14.3-dev Nov 21, 2025
5 checks passed
c121914yu added a commit that referenced this pull request Nov 21, 2025
* fix: text split

* remove test

* feat: integrate S3 for dataset with compatibility

* fix: delay s3 files delete timing

* fix: remove imageKeys

* fix: remove parsed images' TTL

* fix: improve codes by pr comments

---------

Co-authored-by: archer <545436317@qq.com>
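
Several of the squashed commits concern TTL handling for uploaded S3 files, and the summary describes removing TTLs once files are in use. Presumably, a fresh upload is marked temporary so it can be garbage-collected if it is never attached to a collection, and attaching it clears the mark. The in-memory sketch below shows that lifecycle only; `TtlRegistry`, its methods, and the durations are hypothetical, and the PR's TTL utils may store these records elsewhere.

```typescript
// Hypothetical in-memory TTL registry; not the PR's storage mechanism.
class TtlRegistry {
  private expiresAt = new Map<string, number>(); // object key -> expiry (ms since epoch)

  // Mark a freshly uploaded object as temporary.
  add(key: string, ttlMs: number, now: number = Date.now()): void {
    this.expiresAt.set(key, now + ttlMs);
  }

  // Called once the object is referenced by a collection, so it becomes permanent.
  remove(key: string): void {
    this.expiresAt.delete(key);
  }

  // Return (and forget) keys whose TTL has lapsed; the caller deletes them from S3.
  collectExpired(now: number = Date.now()): string[] {
    const expired: string[] = [];
    for (const [key, at] of this.expiresAt) {
      if (at <= now) {
        expired.push(key);
        this.expiresAt.delete(key); // deleting during Map iteration is safe in JS
      }
    }
    return expired;
  }
}

// Usage: two uploads, one of which gets attached to a collection.
const ttl = new TtlRegistry();
ttl.add("dataset/d1/upload-a.pdf", 60_000, 0);
ttl.add("dataset/d1/upload-b.pdf", 60_000, 0);
ttl.remove("dataset/d1/upload-a.pdf"); // attached to a collection, so it is kept
console.log(ttl.collectExpired(120_000)); // logs [ 'dataset/d1/upload-b.pdf' ]
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test; production code would default to the current time as shown.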